Overview
Dataset statistics
| Number of variables | 14 |
|---|---|
| Number of observations | 47619 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 44 |
| Duplicate rows (%) | 0.1% |
| Total size in memory | 5.4 MiB |
| Average record size in memory | 120.0 B |
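The percentage cells in this report appear to follow one rounding rule. A count of exactly zero prints as `0.0%`. A non-zero share that would round to 0.0 prints as `< 0.1%`. Everything else is rounded to one decimal place. For example, 23 of 47619 rows is shown as `< 0.1%`, while 26 rows is shown as `0.1%`. The sketch below reproduces that rule. The rule is inferred from the tables rather than taken from ydata-profiling's source, and the function name `format_pct` is hypothetical.

```python
def format_pct(count: int, total: int) -> str:
    """Format count/total the way this report's tables appear to (inferred rule)."""
    if count == 0:
        return "0.0%"
    pct = 100.0 * count / total
    # Non-zero shares that would round to 0.0 are displayed as "< 0.1%".
    if pct < 0.05:
        return "< 0.1%"
    return f"{pct:.1f}%"


N_ROWS = 47619  # "Number of observations" in the table above

print(format_pct(44, N_ROWS))  # duplicate rows -> 0.1%
print(format_pct(0, N_ROWS))   # missing cells -> 0.0%
print(format_pct(23, N_ROWS))  # hours-per-week == 1 -> < 0.1%
print(format_pct(26, N_ROWS))  # native-country == 1 -> 0.1%
```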
Variable types
| Numeric | 11 |
|---|---|
| Categorical | 3 |
Alerts
| Dataset has 44 (0.1%) duplicate rows | Duplicates |
|---|---|
| gender is highly overall correlated with relationship | High correlation |
| relationship is highly overall correlated with gender | High correlation |
| race is highly imbalanced (65.9%) | Imbalance |
| workclass has 1426 (3.0%) zeros | Zeros |
| marital-status has 6563 (13.8%) zeros | Zeros |
| occupation has 5559 (11.7%) zeros | Zeros |
| relationship has 19172 (40.3%) zeros | Zeros |
| capital-gain has 43674 (91.7%) zeros | Zeros |
| capital-loss has 45379 (95.3%) zeros | Zeros |
| native-country has 824 (1.7%) zeros | Zeros |
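The 65.9% imbalance figure for race can be reproduced as one minus the normalized Shannon entropy of the category frequencies. The calculation uses the race counts listed further down (40739, 4588, 1464, 461, 367). Treat this as a reconstruction that matches the reported number, not as ydata-profiling's documented formula. The function name `imbalance` is hypothetical.

```python
import math


def imbalance(counts: list[int]) -> float:
    """1 - H(p) / log(k): 0 for a uniform distribution, near 1 when one category dominates."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    if len(probs) < 2:
        raise ValueError("need at least two observed categories")
    entropy = -sum(p * math.log(p) for p in probs)  # Shannon entropy in nats
    return 1.0 - entropy / math.log(len(probs))     # normalize by the maximum entropy


race_counts = [40739, 4588, 1464, 461, 367]
gender_counts = [31759, 15860]

print(f"race:   {imbalance(race_counts):.1%}")  # 65.9%
print(f"gender: {imbalance(gender_counts):.1%}")
```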
Reproduction
| Analysis started | 2025-07-12 08:02:38.185642 |
|---|---|
| Analysis finished | 2025-07-12 08:02:55.458780 |
| Duration | 17.27 seconds |
| Software version | ydata-profiling v4.16.1 |
Variables
age
Real number (ℝ)
| Distinct | 59 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 38.230664 |
| Minimum | 17 |
|---|---|
| Maximum | 75 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 744.0 KiB |
Quantile statistics
| Minimum | 17 |
|---|---|
| 5-th percentile | 19 |
| Q1 | 28 |
| median | 37 |
| Q3 | 47 |
| 95-th percentile | 62 |
| Maximum | 75 |
| Range | 58 |
| Interquartile range (IQR) | 19 |
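The quantile rows above can be recomputed from raw values. The sketch below uses Python's `statistics.quantiles` with `method="inclusive"`. This method interpolates linearly between order statistics, which is the same rule as pandas' default `quantile`. That the report used linear interpolation is an assumption, but it is consistent with the half-unit Q1 reported for fnlwgt (117359.5). `quantile_summary` is a hypothetical helper.

```python
import statistics


def quantile_summary(values: list[float]) -> dict[str, float]:
    """Quantile statistics in the same order as the report's table."""
    # n=4 yields the three quartile cut points; n=20 yields cut points in 5% steps.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    ventiles = statistics.quantiles(values, n=20, method="inclusive")
    lo, hi = min(values), max(values)
    return {
        "Minimum": lo,
        "5-th percentile": ventiles[0],
        "Q1": q1,
        "median": median,
        "Q3": q3,
        "95-th percentile": ventiles[-1],
        "Maximum": hi,
        "Range": hi - lo,
        "Interquartile range (IQR)": q3 - q1,
    }


for name, value in quantile_summary(list(range(1, 11))).items():
    print(f"{name}: {value}")
```

For age, the same definitions give Range = 75 - 17 = 58 and IQR = 47 - 28 = 19, matching the table.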
Descriptive statistics
| Standard deviation | 13.199351 |
|---|---|
| Coefficient of variation (CV) | 0.3452556 |
| Kurtosis | -0.54604325 |
| Mean | 38.230664 |
| Median Absolute Deviation (MAD) | 10 |
| Skewness | 0.4425353 |
| Sum | 1820506 |
| Variance | 174.22286 |
| Monotonicity | Not monotonic |
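The descriptive statistics follow from a handful of standard estimators. The sketch below assumes pandas-style definitions for each one:

- Standard deviation and variance use n - 1 in the denominator.
- MAD is the median of absolute deviations from the median.
- Skewness and kurtosis are bias-adjusted sample estimates, with kurtosis reported as excess kurtosis, so a normal distribution scores about 0.

These definitions are assumptions about how the report was produced, and `describe` is a hypothetical helper. As a quick consistency check on the age figures, CV = 13.199351 / 38.230664 ≈ 0.3452556 and the variance is 13.199351² ≈ 174.22286.

```python
import math
import statistics


def describe(values: list[float]) -> dict[str, float]:
    """Descriptive statistics with pandas-style estimators (assumed); needs n >= 4."""
    n = len(values)
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # sample standard deviation (n - 1)
    median = statistics.median(values)

    # Central moments with 1/n normalisation, the inputs to the skew/kurtosis estimators.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    g1 = m3 / m2 ** 1.5          # biased skewness
    g2 = m4 / m2 ** 2 - 3.0      # biased excess kurtosis

    return {
        "Standard deviation": std,
        "Coefficient of variation (CV)": std / mean,
        "Kurtosis": (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6.0),
        "Mean": mean,
        "Median Absolute Deviation (MAD)": statistics.median(abs(x - median) for x in values),
        "Skewness": math.sqrt(n * (n - 1)) / (n - 2) * g1,
        "Sum": math.fsum(values),
        "Variance": statistics.variance(values),
    }


stats = describe([2, 4, 4, 4, 5, 5, 7, 9])
for name, value in stats.items():
    print(f"{name}: {value:.6f}")

# Consistency check against the age row above.
print(round(13.199351 / 38.230664, 7))  # 0.3452556
```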
Common values
| Value | Count | Frequency (%) |
| 36 | 1330 | 2.8% |
| 35 | 1328 | 2.8% |
| 33 | 1312 | 2.8% |
| 23 | 1307 | 2.7% |
| 31 | 1303 | 2.7% |
| 34 | 1294 | 2.7% |
| 30 | 1268 | 2.7% |
| 28 | 1264 | 2.7% |
| 37 | 1254 | 2.6% |
| 38 | 1252 | 2.6% |
| Other values (49) | 34707 | 72.9% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 17 | 590 | 1.2% |
| 18 | 853 | 1.8% |
| 19 | 1041 | 2.2% |
| 20 | 1101 | 2.3% |
| 21 | 1081 | 2.3% |
| 22 | 1161 | 2.4% |
| 23 | 1307 | 2.7% |
| 24 | 1190 | 2.5% |
| 25 | 1176 | 2.5% |
| 26 | 1138 | 2.4% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 75 | 68 | 0.1% |
| 74 | 70 | 0.1% |
| 73 | 102 | 0.2% |
| 72 | 113 | 0.2% |
| 71 | 111 | 0.2% |
| 70 | 130 | 0.3% |
| 69 | 144 | 0.3% |
| 68 | 170 | 0.4% |
| 67 | 232 | 0.5% |
| 66 | 231 | 0.5% |
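Each numeric variable is followed by three value-count tables:

- the ten most frequent values, plus an "Other values (k)" row;
- the ten lowest distinct values;
- the ten highest distinct values.

For age, 59 distinct values minus the 10 listed gives "Other values (49)". The listed counts (12912 in total) plus 34707 add up to the 47619 observations. Below is a minimal stdlib sketch of how such tables can be built with `collections.Counter`. `frequency_tables` is hypothetical, and tie-breaking among equal counts may differ from the report's.

```python
from collections import Counter


def frequency_tables(values, top=10):
    """Return (common, other, lowest, highest) value-count tables."""
    counts = Counter(values)
    common = counts.most_common(top)                # [(value, count), ...] by count
    other_distinct = len(counts) - len(common)      # the "k" in "Other values (k)"
    other_count = len(values) - sum(c for _, c in common)
    ordered = sorted(counts.items())                # ascending by value
    lowest = ordered[:top]
    highest = ordered[::-1][:top]
    return common, (other_distinct, other_count), lowest, highest


sample = [3, 3, 3, 5, 5, 1, 2, 2, 2, 2, 0]
common, other, lowest, highest = frequency_tables(sample, top=2)
print(common)   # [(2, 4), (3, 3)]
print(other)    # (3, 4)
print(lowest)   # [(0, 1), (1, 1)]
print(highest)  # [(5, 2), (3, 3)]
```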
workclass
Real number (ℝ)
Zeros 
| Distinct | 7 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3.041391 |
| Minimum | 0 |
|---|---|
| Maximum | 6 |
| Zeros | 1426 |
| Zeros (%) | 3.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 744.0 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 3 |
| median | 3 |
| Q3 | 3 |
| 95-th percentile | 5 |
| Maximum | 6 |
| Range | 6 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 1.1423469 |
|---|---|
| Coefficient of variation (CV) | 0.37560015 |
| Kurtosis | 1.7659471 |
| Mean | 3.041391 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 0.15933069 |
| Sum | 144828 |
| Variance | 1.3049565 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
| 3 | 33089 | 69.5% |
| 5 | 3744 | 7.9% |
| 1 | 3089 | 6.5% |
| 2 | 2639 | 5.5% |
| 6 | 1973 | 4.1% |
| 4 | 1659 | 3.5% |
| 0 | 1426 | 3.0% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 0 | 1426 | 3.0% |
| 1 | 3089 | 6.5% |
| 2 | 2639 | 5.5% |
| 3 | 33089 | 69.5% |
| 4 | 1659 | 3.5% |
| 5 | 3744 | 7.9% |
| 6 | 1973 | 4.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 6 | 1973 | 4.1% |
| 5 | 3744 | 7.9% |
| 4 | 1659 | 3.5% |
| 3 | 33089 | 69.5% |
| 2 | 2639 | 5.5% |
| 1 | 3089 | 6.5% |
| 0 | 1426 | 3.0% |
fnlwgt
Real number (ℝ)
| Distinct | 27805 |
|---|---|
| Distinct (%) | 58.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 189143 |
| Minimum | 12285 |
|---|---|
| Maximum | 1490400 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 744.0 KiB |
Quantile statistics
| Minimum | 12285 |
|---|---|
| 5-th percentile | 39478 |
| Q1 | 117359.5 |
| median | 177858 |
| Q3 | 236696 |
| 95-th percentile | 378036 |
| Maximum | 1490400 |
| Range | 1478115 |
| Interquartile range (IQR) | 119336.5 |
Descriptive statistics
| Standard deviation | 105421.76 |
|---|---|
| Coefficient of variation (CV) | 0.5573654 |
| Kurtosis | 6.2203837 |
| Mean | 189143 |
| Median Absolute Deviation (MAD) | 60069 |
| Skewness | 1.4540404 |
| Sum | 9.0068003 × 10⁹ |
| Variance | 1.1113748 × 10¹⁰ |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
| 203488 | 21 | < 0.1% |
| 120277 | 19 | < 0.1% |
| 190290 | 19 | < 0.1% |
| 125892 | 18 | < 0.1% |
| 126569 | 18 | < 0.1% |
| 126675 | 17 | < 0.1% |
| 99185 | 17 | < 0.1% |
| 113364 | 17 | < 0.1% |
| 186934 | 16 | < 0.1% |
| 111567 | 16 | < 0.1% |
| Other values (27795) | 47441 | 99.6% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 12285 | 1 | < 0.1% |
| 13492 | 1 | < 0.1% |
| 13769 | 3 | < 0.1% |
| 13862 | 1 | < 0.1% |
| 14878 | 1 | < 0.1% |
| 18827 | 1 | < 0.1% |
| 19214 | 1 | < 0.1% |
| 19302 | 6 | < 0.1% |
| 19395 | 2 | < 0.1% |
| 19410 | 2 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 1490400 | 1 | < 0.1% |
| 1484705 | 1 | < 0.1% |
| 1455435 | 1 | < 0.1% |
| 1366120 | 1 | < 0.1% |
| 1268339 | 1 | < 0.1% |
| 1226583 | 1 | < 0.1% |
| 1210504 | 1 | < 0.1% |
| 1184622 | 1 | < 0.1% |
| 1161363 | 1 | < 0.1% |
| 1125613 | 1 | < 0.1% |
educational-num
Real number (ℝ)
| Distinct | 13 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 10.217602 |
| Minimum | 4 |
|---|---|
| Maximum | 16 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 744.0 KiB |
Quantile statistics
| Minimum | 4 |
|---|---|
| 5-th percentile | 6 |
| Q1 | 9 |
| median | 10 |
| Q3 | 13 |
| 95-th percentile | 14 |
| Maximum | 16 |
| Range | 12 |
| Interquartile range (IQR) | 4 |
Descriptive statistics
| Standard deviation | 2.3776774 |
|---|---|
| Coefficient of variation (CV) | 0.23270405 |
| Kurtosis | 0.083528899 |
| Mean | 10.217602 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 0.011172643 |
| Sum | 486552 |
| Variance | 5.6533497 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
| 9 | 15655 | 32.9% |
| 10 | 10824 | 22.7% |
| 13 | 7983 | 16.8% |
| 14 | 2634 | 5.5% |
| 11 | 2053 | 4.3% |
| 7 | 1801 | 3.8% |
| 12 | 1592 | 3.3% |
| 6 | 1373 | 2.9% |
| 4 | 899 | 1.9% |
| 15 | 819 | 1.7% |
| Other values (3) | 1986 | 4.2% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 4 | 899 | 1.9% |
| 5 | 745 | 1.6% |
| 6 | 1373 | 2.9% |
| 7 | 1801 | 3.8% |
| 8 | 654 | 1.4% |
| 9 | 15655 | 32.9% |
| 10 | 10824 | 22.7% |
| 11 | 2053 | 4.3% |
| 12 | 1592 | 3.3% |
| 13 | 7983 | 16.8% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 16 | 587 | 1.2% |
| 15 | 819 | 1.7% |
| 14 | 2634 | 5.5% |
| 13 | 7983 | 16.8% |
| 12 | 1592 | 3.3% |
| 11 | 2053 | 4.3% |
| 10 | 10824 | 22.7% |
| 9 | 15655 | 32.9% |
| 8 | 654 | 1.4% |
| 7 | 1801 | 3.8% |
marital-status
Real number (ℝ)
Zeros 
| Distinct | 7 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.6080766 |
| Minimum | 0 |
|---|---|
| Maximum | 6 |
| Zeros | 6563 |
| Zeros (%) | 13.8% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 744.0 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 2 |
| median | 2 |
| Q3 | 4 |
| 95-th percentile | 5 |
| Maximum | 6 |
| Range | 6 |
| Interquartile range (IQR) | 2 |
Descriptive statistics
| Standard deviation | 1.5032867 |
|---|---|
| Coefficient of variation (CV) | 0.57639668 |
| Kurtosis | -0.55800219 |
| Mean | 2.6080766 |
| Median Absolute Deviation (MAD) | 2 |
| Skewness | -0.035057734 |
| Sum | 124194 |
| Variance | 2.2598709 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
| 2 | 21768 | 45.7% |
| 4 | 15850 | 33.3% |
| 0 | 6563 | 13.8% |
| 5 | 1481 | 3.1% |
| 6 | 1352 | 2.8% |
| 3 | 568 | 1.2% |
| 1 | 37 | 0.1% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 0 | 6563 | 13.8% |
| 1 | 37 | 0.1% |
| 2 | 21768 | 45.7% |
| 3 | 568 | 1.2% |
| 4 | 15850 | 33.3% |
| 5 | 1481 | 3.1% |
| 6 | 1352 | 2.8% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 6 | 1352 | 2.8% |
| 5 | 1481 | 3.1% |
| 4 | 15850 | 33.3% |
| 3 | 568 | 1.2% |
| 2 | 21768 | 45.7% |
| 1 | 37 | 0.1% |
| 0 | 6563 | 13.8% |
occupation
Real number (ℝ)
Zeros 
| Distinct | 15 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 6.4396984 |
| Minimum | 0 |
|---|---|
| Maximum | 14 |
| Zeros | 5559 |
| Zeros (%) | 11.7% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 744.0 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 3 |
| median | 7 |
| Q3 | 10 |
| 95-th percentile | 13 |
| Maximum | 14 |
| Range | 14 |
| Interquartile range (IQR) | 7 |
Descriptive statistics
| Standard deviation | 4.3514722 |
|---|---|
| Coefficient of variation (CV) | 0.67572608 |
| Kurtosis | -1.2661807 |
| Mean | 6.4396984 |
| Median Absolute Deviation (MAD) | 4 |
| Skewness | 0.1109995 |
| Sum | 306652 |
| Variance | 18.93531 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
| 10 | 6129 | 12.9% |
| 3 | 6024 | 12.7% |
| 2 | 5990 | 12.6% |
| 0 | 5559 | 11.7% |
| 12 | 5442 | 11.4% |
| 7 | 4715 | 9.9% |
| 6 | 2869 | 6.0% |
| 8 | 2639 | 5.5% |
| 14 | 2294 | 4.8% |
| 5 | 1975 | 4.1% |
| Other values (5) | 3983 | 8.4% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 0 | 5559 | 11.7% |
| 1 | 15 | < 0.1% |
| 2 | 5990 | 12.6% |
| 3 | 6024 | 12.7% |
| 4 | 1353 | 2.8% |
| 5 | 1975 | 4.1% |
| 6 | 2869 | 6.0% |
| 7 | 4715 | 9.9% |
| 8 | 2639 | 5.5% |
| 9 | 198 | 0.4% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 14 | 2294 | 4.8% |
| 13 | 1444 | 3.0% |
| 12 | 5442 | 11.4% |
| 11 | 973 | 2.0% |
| 10 | 6129 | 12.9% |
| 9 | 198 | 0.4% |
| 8 | 2639 | 5.5% |
| 7 | 4715 | 9.9% |
| 6 | 2869 | 6.0% |
| 5 | 1975 | 4.1% |
relationship
Real number (ℝ)
High correlation  Zeros 
| Distinct | 6 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.4501355 |
| Minimum | 0 |
|---|---|
| Maximum | 5 |
| Zeros | 19172 |
| Zeros (%) | 40.3% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 744.0 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 1 |
| Q3 | 3 |
| 95-th percentile | 4 |
| Maximum | 5 |
| Range | 5 |
| Interquartile range (IQR) | 3 |
Descriptive statistics
| Standard deviation | 1.6049031 |
|---|---|
| Coefficient of variation (CV) | 1.1067263 |
| Kurtosis | -0.77177702 |
| Mean | 1.4501355 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 0.78236617 |
| Sum | 69054 |
| Variance | 2.5757139 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
| 0 | 19172 | 40.3% |
| 1 | 12229 | 25.7% |
| 3 | 7520 | 15.8% |
| 4 | 4998 | 10.5% |
| 5 | 2291 | 4.8% |
| 2 | 1409 | 3.0% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 0 | 19172 | 40.3% |
| 1 | 12229 | 25.7% |
| 2 | 1409 | 3.0% |
| 3 | 7520 | 15.8% |
| 4 | 4998 | 10.5% |
| 5 | 2291 | 4.8% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 5 | 2291 | 4.8% |
| 4 | 4998 | 10.5% |
| 3 | 7520 | 15.8% |
| 2 | 1409 | 3.0% |
| 1 | 12229 | 25.7% |
| 0 | 19172 | 40.3% |
race
Categorical
Imbalance 
| Distinct | 5 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 744.0 KiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 2 |
|---|---|
| 2nd row | 4 |
| 3rd row | 4 |
| 4th row | 2 |
| 5th row | 4 |
Common Values
| Value | Count | Frequency (%) |
| 4 | 40739 | 85.6% |
| 2 | 4588 | 9.6% |
| 1 | 1464 | 3.1% |
| 0 | 461 | 1.0% |
| 3 | 367 | 0.8% |
Common Values (Plot)
| Value | Count | Frequency (%) |
| 4 | 40739 | 85.6% |
| 2 | 4588 | 9.6% |
| 1 | 1464 | 3.1% |
| 0 | 461 | 1.0% |
| 3 | 367 | 0.8% |
Most occurring characters
| Value | Count | Frequency (%) |
| 4 | 40739 | 85.6% |
| 2 | 4588 | 9.6% |
| 1 | 1464 | 3.1% |
| 0 | 461 | 1.0% |
| 3 | 367 | 0.8% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 47619 | 100.0% |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| 4 | 40739 | 85.6% |
| 2 | 4588 | 9.6% |
| 1 | 1464 | 3.1% |
| 0 | 461 | 1.0% |
| 3 | 367 | 0.8% |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 47619 | 100.0% |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| 4 | 40739 | 85.6% |
| 2 | 4588 | 9.6% |
| 1 | 1464 | 3.1% |
| 0 | 461 | 1.0% |
| 3 | 367 | 0.8% |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 47619 | 100.0% |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| 4 | 40739 | 85.6% |
| 2 | 4588 | 9.6% |
| 1 | 1464 | 3.1% |
| 0 | 461 | 1.0% |
| 3 | 367 | 0.8% |
gender
Categorical
High correlation 
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 744.0 KiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1 |
|---|---|
| 2nd row | 1 |
| 3rd row | 1 |
| 4th row | 1 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
| 1 | 31759 | 66.7% |
| 0 | 15860 | 33.3% |
Common Values (Plot)
| Value | Count | Frequency (%) |
| 1 | 31759 | 66.7% |
| 0 | 15860 | 33.3% |
Most occurring characters
| Value | Count | Frequency (%) |
| 1 | 31759 | 66.7% |
| 0 | 15860 | 33.3% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 47619 | 100.0% |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| 1 | 31759 | 66.7% |
| 0 | 15860 | 33.3% |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 47619 | 100.0% |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| 1 | 31759 | 66.7% |
| 0 | 15860 | 33.3% |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 47619 | 100.0% |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| 1 | 31759 | 66.7% |
| 0 | 15860 | 33.3% |
capital-gain
Real number (ℝ)
Zeros 
| Distinct | 122 |
|---|---|
| Distinct (%) | 0.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1088.8692 |
| Minimum | 0 |
|---|---|
| Maximum | 99999 |
| Zeros | 43674 |
| Zeros (%) | 91.7% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 744.0 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 5013 |
| Maximum | 99999 |
| Range | 99999 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 7495.3827 |
|---|---|
| Coefficient of variation (CV) | 6.8836392 |
| Kurtosis | 151.0297 |
| Mean | 1088.8692 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 11.83411 |
| Sum | 51850862 |
| Variance | 56180761 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
| 0 | 43674 | 91.7% |
| 15024 | 513 | 1.1% |
| 7688 | 408 | 0.9% |
| 7298 | 361 | 0.8% |
| 99999 | 241 | 0.5% |
| 3103 | 151 | 0.3% |
| 5178 | 145 | 0.3% |
| 5013 | 117 | 0.2% |
| 4386 | 107 | 0.2% |
| 8614 | 82 | 0.2% |
| Other values (112) | 1820 | 3.8% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 0 | 43674 | 91.7% |
| 114 | 8 | < 0.1% |
| 401 | 3 | < 0.1% |
| 594 | 51 | 0.1% |
| 914 | 10 | < 0.1% |
| 991 | 5 | < 0.1% |
| 1055 | 37 | 0.1% |
| 1086 | 5 | < 0.1% |
| 1111 | 1 | < 0.1% |
| 1151 | 13 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 99999 | 241 | 0.5% |
| 41310 | 2 | < 0.1% |
| 34095 | 6 | < 0.1% |
| 27828 | 58 | 0.1% |
| 25236 | 14 | < 0.1% |
| 25124 | 6 | < 0.1% |
| 22040 | 1 | < 0.1% |
| 20051 | 41 | 0.1% |
| 18481 | 1 | < 0.1% |
| 15831 | 8 | < 0.1% |
capital-loss
Real number (ℝ)
Zeros 
| Distinct | 99 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 87.912619 |
| Minimum | 0 |
|---|---|
| Maximum | 4356 |
| Zeros | 45379 |
| Zeros (%) | 95.3% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 744.0 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 0 |
| Maximum | 4356 |
| Range | 4356 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 403.19364 |
|---|---|
| Coefficient of variation (CV) | 4.5862999 |
| Kurtosis | 19.542084 |
| Mean | 87.912619 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 4.534775 |
| Sum | 4186311 |
| Variance | 162565.11 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
| 0 | 45379 | 95.3% |
| 1902 | 303 | 0.6% |
| 1977 | 253 | 0.5% |
| 1887 | 231 | 0.5% |
| 2415 | 72 | 0.2% |
| 1485 | 71 | 0.1% |
| 1848 | 67 | 0.1% |
| 1590 | 62 | 0.1% |
| 1602 | 61 | 0.1% |
| 1740 | 58 | 0.1% |
| Other values (89) | 1062 | 2.2% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 0 | 45379 | 95.3% |
| 155 | 1 | < 0.1% |
| 213 | 5 | < 0.1% |
| 323 | 5 | < 0.1% |
| 419 | 3 | < 0.1% |
| 625 | 17 | < 0.1% |
| 653 | 4 | < 0.1% |
| 810 | 2 | < 0.1% |
| 880 | 6 | < 0.1% |
| 974 | 2 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 4356 | 1 | < 0.1% |
| 3900 | 2 | < 0.1% |
| 3770 | 4 | < 0.1% |
| 3683 | 2 | < 0.1% |
| 3175 | 2 | < 0.1% |
| 3004 | 5 | < 0.1% |
| 2824 | 14 | < 0.1% |
| 2754 | 2 | < 0.1% |
| 2603 | 4 | < 0.1% |
| 2559 | 17 | < 0.1% |
hours-per-week
Real number (ℝ)
| Distinct | 96 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 40.564733 |
| Minimum | 1 |
|---|---|
| Maximum | 99 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 744.0 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 18 |
| Q1 | 40 |
| median | 40 |
| Q3 | 45 |
| 95-th percentile | 60 |
| Maximum | 99 |
| Range | 98 |
| Interquartile range (IQR) | 5 |
Descriptive statistics
| Standard deviation | 12.304123 |
|---|---|
| Coefficient of variation (CV) | 0.30332069 |
| Kurtosis | 3.0028944 |
| Mean | 40.564733 |
| Median Absolute Deviation (MAD) | 3 |
| Skewness | 0.26442778 |
| Sum | 1931652 |
| Variance | 151.39143 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
| 40 | 22243 | 46.7% |
| 50 | 4201 | 8.8% |
| 45 | 2679 | 5.6% |
| 60 | 2160 | 4.5% |
| 35 | 1881 | 4.0% |
| 20 | 1781 | 3.7% |
| 30 | 1646 | 3.5% |
| 55 | 1037 | 2.2% |
| 25 | 921 | 1.9% |
| 48 | 759 | 1.6% |
| Other values (86) | 8311 | 17.5% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 1 | 23 | < 0.1% |
| 2 | 43 | 0.1% |
| 3 | 49 | 0.1% |
| 4 | 75 | 0.2% |
| 5 | 84 | 0.2% |
| 6 | 82 | 0.2% |
| 7 | 41 | 0.1% |
| 8 | 202 | 0.4% |
| 9 | 27 | 0.1% |
| 10 | 396 | 0.8% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 99 | 134 | 0.3% |
| 98 | 14 | < 0.1% |
| 97 | 2 | < 0.1% |
| 96 | 7 | < 0.1% |
| 95 | 2 | < 0.1% |
| 94 | 1 | < 0.1% |
| 92 | 3 | < 0.1% |
| 91 | 3 | < 0.1% |
| 90 | 42 | 0.1% |
| 89 | 3 | < 0.1% |
native-country
Real number (ℝ)
Zeros 
| Distinct | 42 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 36.928159 |
| Minimum | 0 |
|---|---|
| Maximum | 41 |
| Zeros | 824 |
| Zeros (%) | 1.7% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 744.0 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 20 |
| Q1 | 39 |
| median | 39 |
| Q3 | 39 |
| 95-th percentile | 39 |
| Maximum | 41 |
| Range | 41 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 7.5771962 |
|---|---|
| Coefficient of variation (CV) | 0.20518749 |
| Kurtosis | 14.203959 |
| Mean | 36.928159 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | -3.8834254 |
| Sum | 1758482 |
| Variance | 57.413903 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
| 39 | 43234 | 90.8% |
| 0 | 824 | 1.7% |
| 26 | 601 | 1.3% |
| 30 | 276 | 0.6% |
| 11 | 206 | 0.4% |
| 2 | 177 | 0.4% |
| 33 | 170 | 0.4% |
| 19 | 150 | 0.3% |
| 5 | 123 | 0.3% |
| 9 | 122 | 0.3% |
| Other values (32) | 1736 | 3.6% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 0 | 824 | 1.7% |
| 1 | 26 | 0.1% |
| 2 | 177 | 0.4% |
| 3 | 116 | 0.2% |
| 4 | 81 | 0.2% |
| 5 | 123 | 0.3% |
| 6 | 85 | 0.2% |
| 7 | 41 | 0.1% |
| 8 | 105 | 0.2% |
| 9 | 122 | 0.3% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 41 | 22 | < 0.1% |
| 40 | 80 | 0.2% |
| 39 | 43234 | 90.8% |
| 38 | 26 | 0.1% |
| 37 | 29 | 0.1% |
| 36 | 65 | 0.1% |
| 35 | 111 | 0.2% |
| 34 | 20 | < 0.1% |
| 33 | 170 | 0.4% |
| 32 | 55 | 0.1% |
income
Categorical
Length
| Max length | 5 |
|---|---|
| Median length | 5 |
| Mean length | 4.7566098 |
| Min length | 4 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | <=50K |
|---|---|
| 2nd row | <=50K |
| 3rd row | >50K |
| 4th row | >50K |
| 5th row | <=50K |
Common Values
| Value | Count | Frequency (%) |
| <=50K | 36029 | 75.7% |
| >50K | 11590 | 24.3% |
Common Values (Plot)
| Value | Count | Frequency (%) |
| <=50K | 36029 | 75.7% |
| >50K | 11590 | 24.3% |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 47619 | 21.0% |
| 5 | 47619 | 21.0% |
| K | 47619 | 21.0% |
| < | 36029 | 15.9% |
| = | 36029 | 15.9% |
| > | 11590 | 5.1% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 226505 | 100.0% |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| 0 | 47619 | 21.0% |
| 5 | 47619 | 21.0% |
| K | 47619 | 21.0% |
| < | 36029 | 15.9% |
| = | 36029 | 15.9% |
| > | 11590 | 5.1% |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 226505 | 100.0% |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| 0 | 47619 | 21.0% |
| 5 | 47619 | 21.0% |
| K | 47619 | 21.0% |
| < | 36029 | 15.9% |
| = | 36029 | 15.9% |
| > | 11590 | 5.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 226505 | 100.0% |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| 0 | 47619 | 21.0% |
| 5 | 47619 | 21.0% |
| K | 47619 | 21.0% |
| < | 36029 | 15.9% |
| = | 36029 | 15.9% |
| > | 11590 | 5.1% |
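The income length and character tables can be derived directly from the two category counts. `<=50K` has five characters and `>50K` has four, so the 47619 values contain 5 × 36029 + 4 × 11590 = 226505 characters. The mean length is therefore 226505 / 47619 ≈ 4.7566. Character frequencies are shares of that character total. This is why `>` (11590 occurrences) shows 5.1% rather than its 24.3% share of rows. A small sketch of the arithmetic:

```python
from collections import Counter

# Category counts from the income "Common Values" table.
income_counts = {"<=50K": 36029, ">50K": 11590}

n_rows = sum(income_counts.values())
char_counts = Counter()
for value, count in income_counts.items():
    for ch in value:
        char_counts[ch] += count  # every row contributes each of its characters

total_chars = sum(char_counts.values())
mean_length = total_chars / n_rows

print(n_rows, total_chars, round(mean_length, 7))
for ch, count in char_counts.most_common():
    print(f"{ch!r}: {count} ({100 * count / total_chars:.1f}%)")
```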
Interactions
Correlations
|  | age | capital-gain | capital-loss | educational-num | fnlwgt | gender | hours-per-week | income | marital-status | native-country | occupation | race | relationship | workclass |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| age | 1.000 | 0.126 | 0.059 | 0.082 | -0.075 | 0.131 | 0.167 | 0.322 | -0.389 | 0.008 | -0.007 | 0.031 | -0.324 | 0.065 |
| capital-gain | 0.126 | 1.000 | -0.067 | 0.118 | -0.007 | 0.049 | 0.094 | 0.269 | -0.076 | 0.012 | 0.015 | 0.014 | -0.101 | 0.027 |
| capital-loss | 0.059 | -0.067 | 1.000 | 0.077 | 0.000 | 0.065 | 0.059 | 0.198 | -0.042 | 0.005 | 0.016 | 0.014 | -0.064 | 0.010 |
| educational-num | 0.082 | 0.118 | 0.077 | 1.000 | -0.018 | 0.075 | 0.162 | 0.361 | -0.062 | -0.006 | 0.108 | 0.065 | -0.100 | 0.030 |
| fnlwgt | -0.075 | -0.007 | 0.000 | -0.018 | 1.000 | 0.025 | -0.019 | 0.010 | 0.038 | -0.059 | -0.001 | 0.070 | 0.014 | -0.031 |
| gender | 0.131 | 0.049 | 0.065 | 0.075 | 0.025 | 1.000 | 0.244 | 0.218 | 0.459 | 0.030 | 0.381 | 0.115 | 0.647 | 0.153 |
| hours-per-week | 0.167 | 0.094 | 0.059 | 0.162 | -0.019 | 0.244 | 1.000 | 0.269 | -0.207 | 0.010 | 0.013 | 0.059 | -0.309 | 0.119 |
| income | 0.322 | 0.269 | 0.198 | 0.361 | 0.010 | 0.218 | 0.269 | 1.000 | 0.455 | 0.066 | 0.317 | 0.101 | 0.460 | 0.179 |
| marital-status | -0.389 | -0.076 | -0.042 | -0.062 | 0.038 | 0.459 | -0.207 | 0.455 | 1.000 | -0.025 | 0.021 | 0.083 | 0.314 | -0.064 |
| native-country | 0.008 | 0.012 | 0.005 | -0.006 | -0.059 | 0.030 | 0.010 | 0.066 | -0.025 | 1.000 | -0.006 | 0.267 | -0.013 | -0.010 |
| occupation | -0.007 | 0.015 | 0.016 | 0.108 | -0.001 | 0.381 | 0.013 | 0.317 | 0.021 | -0.006 | 1.000 | 0.072 | -0.042 | -0.032 |
| race | 0.031 | 0.014 | 0.014 | 0.065 | 0.070 | 0.115 | 0.059 | 0.101 | 0.083 | 0.267 | 0.072 | 1.000 | 0.099 | 0.057 |
| relationship | -0.324 | -0.101 | -0.064 | -0.100 | 0.014 | 0.647 | -0.309 | 0.460 | 0.314 | -0.013 | -0.042 | 0.099 | 1.000 | -0.110 |
| workclass | 0.065 | 0.027 | 0.010 | 0.030 | -0.031 | 0.153 | 0.119 | 0.179 | -0.064 | -0.010 | -0.032 | 0.057 | -0.110 | 1.000 |
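The matrix mixes two kinds of coefficient. ydata-profiling's default "auto" setting is understood to use Spearman's rank correlation between numeric columns and Cramér's V between categorical ones. That fits the table. Entries in the gender, race and income rows are all non-negative, as Cramér's V always is, while numeric pairs such as age and marital-status go negative (-0.389).

The sketch below implements both measures from scratch under that assumption. It is not the library's code: it omits any bias correction and does not cover how mixed numeric/categorical pairs are scored. All function names are hypothetical.

```python
import math
from collections import Counter


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # undefined for a constant column


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the (tie-averaged) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def cramers_v(x, y):
    """Cramér's V without bias correction: sqrt(chi2 / (n * (min(r, c) - 1)))."""
    n = len(x)
    joint = Counter(zip(x, y))
    rows, cols = Counter(x), Counter(y)
    chi2 = 0.0
    for a, ra in rows.items():
        for b, cb in cols.items():
            expected = ra * cb / n
            chi2 += (joint[(a, b)] - expected) ** 2 / expected
    k = min(len(rows), len(cols))
    return math.sqrt(chi2 / (n * (k - 1)))


print(spearman([1, 2, 3, 4, 5], [1, 4, 9, 16, 25]))           # 1.0 (monotone)
print(cramers_v(["a", "a", "b", "b"], ["x", "x", "y", "y"]))  # 1.0 (perfect association)
```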
Missing values
Sample
First rows
|  | age | workclass | fnlwgt | educational-num | marital-status | occupation | relationship | race | gender | capital-gain | capital-loss | hours-per-week | native-country | income |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 25 | 3 | 226802 | 7 | 4 | 6 | 3 | 2 | 1 | 0 | 0 | 40 | 39 | <=50K |
| 1 | 38 | 3 | 89814 | 9 | 2 | 4 | 0 | 4 | 1 | 0 | 0 | 50 | 39 | <=50K |
| 2 | 28 | 1 | 336951 | 12 | 2 | 11 | 0 | 4 | 1 | 0 | 0 | 40 | 39 | >50K |
| 3 | 44 | 3 | 160323 | 10 | 2 | 6 | 0 | 2 | 1 | 7688 | 0 | 40 | 39 | >50K |
| 4 | 18 | 2 | 103497 | 10 | 4 | 8 | 3 | 4 | 0 | 0 | 0 | 30 | 39 | <=50K |
| 5 | 34 | 3 | 198693 | 6 | 4 | 7 | 1 | 4 | 1 | 0 | 0 | 30 | 39 | <=50K |
| 6 | 29 | 2 | 227026 | 9 | 4 | 8 | 4 | 2 | 1 | 0 | 0 | 40 | 39 | <=50K |
| 7 | 63 | 5 | 104626 | 15 | 2 | 10 | 0 | 4 | 1 | 3103 | 0 | 32 | 39 | >50K |
| 8 | 24 | 3 | 369667 | 10 | 4 | 7 | 4 | 4 | 0 | 0 | 0 | 40 | 39 | <=50K |
| 9 | 55 | 3 | 104996 | 4 | 2 | 2 | 0 | 4 | 1 | 0 | 0 | 10 | 39 | <=50K |
Last rows
|  | age | workclass | fnlwgt | educational-num | marital-status | occupation | relationship | race | gender | capital-gain | capital-loss | hours-per-week | native-country | income |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 48832 | 32 | 3 | 34066 | 6 | 2 | 5 | 0 | 0 | 1 | 0 | 0 | 40 | 39 | <=50K |
| 48833 | 43 | 3 | 84661 | 11 | 2 | 12 | 0 | 4 | 1 | 0 | 0 | 45 | 39 | <=50K |
| 48834 | 32 | 3 | 116138 | 14 | 4 | 13 | 1 | 1 | 1 | 0 | 0 | 11 | 36 | <=50K |
| 48835 | 53 | 3 | 321865 | 14 | 2 | 3 | 0 | 4 | 1 | 0 | 0 | 40 | 39 | >50K |
| 48836 | 22 | 3 | 310152 | 10 | 4 | 11 | 1 | 4 | 1 | 0 | 0 | 40 | 39 | <=50K |
| 48837 | 27 | 3 | 257302 | 12 | 2 | 13 | 5 | 4 | 0 | 0 | 0 | 38 | 39 | <=50K |
| 48838 | 40 | 3 | 154374 | 9 | 2 | 6 | 0 | 4 | 1 | 0 | 0 | 40 | 39 | >50K |
| 48839 | 58 | 3 | 151910 | 9 | 6 | 0 | 4 | 4 | 0 | 0 | 0 | 40 | 39 | <=50K |
| 48840 | 22 | 3 | 201490 | 9 | 4 | 0 | 3 | 4 | 1 | 0 | 0 | 20 | 39 | <=50K |
| 48841 | 52 | 4 | 287927 | 9 | 2 | 3 | 5 | 4 | 0 | 15024 | 0 | 40 | 39 | >50K |
Duplicate rows
Most frequently occurring
|  | age | workclass | fnlwgt | educational-num | marital-status | occupation | relationship | race | gender | capital-gain | capital-loss | hours-per-week | native-country | income | # duplicates |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 20 | 25 | 3 | 308144 | 13 | 4 | 2 | 1 | 4 | 1 | 0 | 0 | 40 | 26 | <=50K | 3 |
| 0 | 17 | 3 | 153021 | 8 | 4 | 12 | 3 | 4 | 0 | 0 | 0 | 20 | 39 | <=50K | 2 |
| 1 | 18 | 4 | 378036 | 8 | 4 | 4 | 3 | 4 | 1 | 0 | 0 | 10 | 39 | <=50K | 2 |
| 2 | 19 | 2 | 167428 | 10 | 4 | 8 | 3 | 4 | 1 | 0 | 0 | 40 | 39 | <=50K | 2 |
| 3 | 19 | 3 | 97261 | 9 | 4 | 4 | 1 | 4 | 1 | 0 | 0 | 40 | 39 | <=50K | 2 |
| 4 | 19 | 3 | 138153 | 10 | 4 | 0 | 3 | 4 | 0 | 0 | 0 | 10 | 39 | <=50K | 2 |
| 5 | 19 | 3 | 139466 | 10 | 4 | 12 | 3 | 4 | 0 | 0 | 0 | 25 | 39 | <=50K | 2 |
| 6 | 19 | 3 | 146679 | 10 | 4 | 3 | 3 | 2 | 1 | 0 | 0 | 30 | 39 | <=50K | 2 |
| 7 | 19 | 3 | 251579 | 10 | 4 | 7 | 3 | 4 | 1 | 0 | 0 | 14 | 39 | <=50K | 2 |
| 8 | 19 | 3 | 318822 | 9 | 4 | 0 | 1 | 4 | 0 | 0 | 0 | 40 | 39 | <=50K | 2 |
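The "# duplicates" column counts how many times each identical row occurs. Grouping rows by their full tuple of values reproduces that kind of table. The report does not state whether the headline "Duplicate rows" figure (44) counts duplicate groups or surplus copies, so the sketch computes both quantities. `duplicate_groups` is a hypothetical helper that works on plain tuples rather than a DataFrame.

```python
from collections import Counter


def duplicate_groups(rows):
    """Return [(row, occurrences)] for rows that appear more than once, largest first."""
    counts = Counter(tuple(row) for row in rows)
    groups = [(row, n) for row, n in counts.items() if n > 1]
    groups.sort(key=lambda item: item[1], reverse=True)  # stable for equal counts
    return groups


rows = [
    (25, 3, "<=50K"), (25, 3, "<=50K"), (25, 3, "<=50K"),
    (17, 3, "<=50K"), (17, 3, "<=50K"),
    (40, 1, ">50K"),
]
groups = duplicate_groups(rows)
surplus = sum(n - 1 for _, n in groups)  # copies beyond each row's first occurrence
print(groups)
print(len(groups), surplus)  # 2 groups, 3 surplus rows
```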